Skip to content

feat: arch-aware downloads in create-build and fetch-busybox target#2260

Merged
tomassrnka merged 3 commits intomainfrom
feat/create-build-arm64
Mar 31, 2026
Merged

feat: arch-aware downloads in create-build and fetch-busybox target#2260
tomassrnka merged 3 commits intomainfrom
feat/create-build-arm64

Conversation

@tomassrnka
Copy link
Copy Markdown
Member

Summary

  • Update create-build (setupKernel/setupFC) to use arch-prefixed download URLs (e.g. vmlinux-<version>-arm64) with automatic legacy fallback for backward compatibility
  • Add errNotFound sentinel error for 404 handling in the download helper, and use atomic temp-file writes to avoid partial downloads on failure
  • Add fetch-busybox Makefile target that swaps the embedded amd64 busybox binary for an arm64 one when building on ARM64 hosts (used as a dependency for build-local and build-debug)

Depends on: #2258 (target arch path resolution)
Part of: #1875 (ARM64 support)

Test plan

  • Verify make build-local on amd64 host still works (fetch-busybox should no-op with "Using bundled amd64 busybox")
  • Verify make build-local on arm64 host triggers busybox swap
  • Verify create-build downloads arch-prefixed kernel/FC binaries when available
  • Verify create-build falls back to legacy (non-prefixed) URLs when arch-prefixed files return 404

🤖 Generated with Claude Code

@cursor
Copy link
Copy Markdown

cursor bot commented Mar 29, 2026

PR Summary

Medium Risk
Changes affect build artifact fetching and on-disk layout, so incorrect arch resolution or URL availability could break build creation on some platforms; logic is bounded and includes explicit 404 handling and amd64-only legacy fallback.

Overview
create-build now resolves the target architecture and downloads kernel and Firecracker binaries from arch-scoped GCS paths, falling back to legacy non-arch URLs only for amd64 when a 404 occurs. The download helper now treats 404 as a sentinel errNotFound and writes via a temporary file + atomic rename to avoid leaving partial artifacts. BusyBox embedding was split into amd64 and arm64 build-tagged sources so ARM builds use an ARM64 binary instead of the bundled AMD64 one.

Written by Cursor Bugbot for commit 604cacb. This will update automatically on new commits. Configure here.

@tomassrnka tomassrnka changed the base branch from feat/target-arch-path-resolution to main March 29, 2026 15:14
.PHONY: fetch-busybox
fetch-busybox:
@ARCH=$$(go env GOARCH); \
BUSYBOX_TARGET=./pkg/template/build/core/systeminit/busybox_1.36.1-2; \
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: arch mismatch between fetch-busybox and build-local/build-debug

fetch-busybox detects the host arch via go env GOARCH and replaces the embedded busybox with an arm64 binary when running on an arm64 host. However, build-local and build-debug both hardcode GOARCH=amd64, so on an arm64 host the resulting amd64 orchestrator binary will embed the arm64 busybox. This will break on amd64 VMs.

The arch check here should use the target build architecture (always amd64 in the current build targets), not the host's go env GOARCH. One option: pass the target arch explicitly, e.g. FETCH_BUSYBOX_TARGET_ARCH=amd64 make fetch-busybox, and have the target use that variable.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved by PR #2257 — once the Makefiles PR merges, build-local uses $(BUILD_ARCH) which defaults to go env GOARCH, matching fetch-busybox. On the current branch, build-local still hardcodes amd64, but the two PRs are designed to merge together.

echo "Fetching arm64 busybox via apt..."; \
TMPDIR=$$(mktemp -d); \
apt-get download busybox-static 2>/dev/null && \
dpkg-deb -x busybox-static_*.deb "$$TMPDIR" && \
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apt-get download busybox-static downloads the .deb into the current working directory. If a previous failed run left a busybox-static_*.deb file behind, the glob on line 144 (dpkg-deb -x busybox-static_*.deb "$$TMPDIR") will expand to multiple files — dpkg-deb -x expects exactly one package file plus an output directory, so it will mis-interpret the second .deb as the output dir and fail with a misleading error.

Fix: change directory into $$TMPDIR before downloading so the .deb lands there and the glob is naturally scoped:

TMPDIR=$$(mktemp -d); \
cd "$$TMPDIR" && apt-get download busybox-static 2>/dev/null && \
    dpkg-deb -x busybox-static_*.deb . && \
    cp bin/busybox "$$BUSYBOX_TARGET" ...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edge case — the rm cleanup at the end of the target handles stale .deb files. In practice, apt-get download produces a deterministic filename so glob collision is extremely unlikely.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8701dbd011

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines 24 to 27
build-local: fetch-busybox
# Allow for passing commit sha directly for docker builds
$(eval COMMIT_SHA ?= $(shell git rev-parse --short HEAD))
CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build -o bin/orchestrator -ldflags "-X=main.commitSHA=$(COMMIT_SHA)" .
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep embedded busybox arch in sync with GOARCH

build-local/build-debug now always run fetch-busybox, which replaces the embedded busybox with an arm64 binary on arm64 hosts, but these targets still compile orchestrator with GOARCH=amd64. This can produce an amd64 orchestrator artifact that embeds an arm64 /usr/bin/busybox, and amd64 template builds using that artifact can fail at runtime with exec format error when init/busybox is invoked inside the guest.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved by PR #2257 — same as above, BUILD_ARCH will match go env GOARCH once Makefiles PR merges.

Comment on lines 422 to +423
if _, err := os.Stat(dstPath); err == nil {
fmt.Printf("✓ Kernel %s exists\n", version)
fmt.Printf("✓ Kernel %s (%s) exists\n", version, arch)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reuse legacy local kernel cache before downloading

setupKernel now checks only the arch-prefixed destination ({version}/{arch}/vmlinux.bin) for an existing local file. On a reused local storage directory that still has the legacy flat layout ({version}/vmlinux.bin), this path misses, triggers a network download, and can fail in offline/restricted environments even though a valid local kernel already exists. The same regression pattern is present in setupFC for firecracker binaries.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor optimization — the legacy local cache at a different path layout doesn't help for the new arch-prefixed structure. Re-downloading is the correct behavior.

Comment on lines +490 to 510
legacyURL, err := url.JoinPath("https://storage.googleapis.com/e2b-prod-public-builds/fc-versions/", version, "firecracker")
if err != nil {
return fmt.Errorf("invalid Firecracker legacy URL: %w", err)
}

fmt.Printf(" %s path not found, trying legacy URL...\n", arch)

return download(ctx, fcURL, dstPath, 0o755)
return download(ctx, legacyURL, dstPath, 0o755)
}

func download(ctx context.Context, url, path string, perm os.FileMode) error {
req, _ := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
var errNotFound = errors.New("not found")

func download(ctx context.Context, rawURL, path string, perm os.FileMode) error {
req, err := http.NewRequestWithContext(ctx, http.MethodGet, rawURL, nil)
if err != nil {
return fmt.Errorf("invalid download URL %s: %w", rawURL, err)
}

resp, err := (&http.Client{Timeout: 5 * time.Minute}).Do(req)
if err != nil {
return err
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The legacy fallback in setupFC now points to GCS (storage.googleapis.com/e2b-prod-public-builds/fc-versions/{version}/firecracker) instead of the original GitHub releases URL, which was the authoritative source for existing Firecracker versions. If pre-existing FC versions are not yet mirrored to GCS at the flat path, both the arch-prefixed and legacy GCS lookups will 404, silently breaking make build-template for amd64 users where it previously succeeded. To fix, either retain the GitHub releases URL as the final fallback or confirm all existing FC versions are pre-populated in GCS before merging.

Extended reasoning...

What the bug is and how it manifests

The original setupFC function used GitHub releases as the sole download source:

fcURL := fmt.Sprintf("https://github.com/e2b-dev/fc-versions/releases/download/%s/firecracker", version)

This PR replaces it with a two-step GCS lookup: first https://storage.googleapis.com/e2b-prod-public-builds/fc-versions/{version}/{arch}/firecracker (arch-prefixed), then https://storage.googleapis.com/e2b-prod-public-builds/fc-versions/{version}/firecracker (flat path legacy fallback). Neither URL is the original GitHub releases URL. The "legacy fallback" does not fall back to the actual legacy source.

The specific code path that triggers it

In setupFC (main.go ~lines 490–530): on amd64, if the arch-prefixed GCS path returns 404, the code falls through to the flat GCS legacy URL. If that also returns 404, download() returns errNotFound wrapped in a non-nil error, and setupFC returns an error — failing the entire setupEnv call, which aborts make build-template.

Why existing code does not prevent it

The flat GCS path fc-versions/{version}/firecracker is a new path introduced by this PR; it did not exist before. The original flat path was GitHub releases. Whether old FC versions have been pre-mirrored to GCS at the flat path is an external infrastructure dependency that cannot be verified from the code alone. The smoke test at cmd/smoketest/smoke_test.go:366 still uses the original GitHub releases URL specifically for Firecracker downloads (while kernels at line 357 have already migrated to GCS), which is strong circumstantial evidence that the GCS flat path for FC versions may not be fully populated.

Impact

Any developer running make build-template (or go run cmd/create-build/main.go -storage ./...) on amd64 with a Firecracker version not yet mirrored to GCS at the flat path will get a fatal error instead of a successful download. This breaks the local development workflow documented in the Makefile.

Addressing the refutation

The refutation argues this is intentional and a deployment concern, not a code bug. However, the code explicitly labels the flat GCS path as a legacy fallback — implying it exists to handle pre-existing versions. A legacy fallback that does not cover the actual legacy source (GitHub releases) fails its stated purpose. The correct fix — retaining GitHub as a final fallback or ensuring GCS is fully populated — should be resolved before merging. The PR description itself lists "Verify create-build falls back to legacy (non-prefixed) URLs when arch-prefixed files return 404" as an unverified test plan item, acknowledging the risk.

Step-by-step proof

  1. Developer runs make build-template FIRECRACKER_VERSION=v1.7.0-dev_8bb88311 ... on an amd64 machine.
  2. setupFC computes arch = "amd64", dstPath = ".../fc-versions/v1.7.0-dev_8bb88311/amd64/firecracker".
  3. File does not exist locally, so download is attempted.
  4. Primary GCS URL https://storage.googleapis.com/.../fc-versions/v1.7.0-dev_8bb88311/amd64/firecracker → 404 (version not yet arch-mirrored to GCS).
  5. errors.Is(err, errNotFound) is true; falls through to legacy path.
  6. Legacy GCS URL https://storage.googleapis.com/.../fc-versions/v1.7.0-dev_8bb88311/firecracker → 404 (this flat GCS path was never populated; the canonical source was GitHub releases).
  7. setupFC returns error; setupEnv propagates; make build-template fails.
  8. Before this PR, step 3 would have fetched from https://github.com/e2b-dev/fc-versions/releases/download/v1.7.0-dev_8bb88311/firecracker and succeeded.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional — GCS (storage.googleapis.com/e2b-prod-public-builds) is the canonical artifact source for both kernels and Firecracker. The GitHub releases URL was a convenience that may not have all versions. The new code tries the arch-prefixed GCS URL first, then falls back to the legacy GCS flat path for amd64 only.

Comment on lines 24 to 34
build-local: fetch-busybox
# Allow for passing commit sha directly for docker builds
$(eval COMMIT_SHA ?= $(shell git rev-parse --short HEAD))
CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build -o bin/orchestrator -ldflags "-X=main.commitSHA=$(COMMIT_SHA)" .
CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build -o bin/clean-nfs-cache -ldflags "-X=main.commitSHA=$(COMMIT_SHA)" ./cmd/clean-nfs-cache

.PHONY: build-debug
build-debug:
build-debug: fetch-busybox
CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build -race -gcflags=all="-N -l" -o bin/orchestrator .

.PHONY: run-debug
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The fetch-busybox target detects host architecture via go env GOARCH, but build-local and build-debug unconditionally cross-compile with GOARCH=amd64; on an ARM64 developer machine this causes an arm64 busybox to be embedded into the amd64 orchestrator binary. At runtime, the amd64 orchestrator writes this arm64 binary as /usr/bin/init inside every Firecracker VM rootfs, causing VM boot to fail with an exec format error. The fix is to make fetch-busybox respect the build target arch (always amd64 for these targets) rather than the host arch, e.g. by passing TARGET_ARCH=amd64 or a Makefile variable to the target.

Extended reasoning...

The bug: arch mismatch between busybox replacement and compile target

The new fetch-busybox Makefile target (lines 128–156) detects the host architecture using go env GOARCH and, when on an ARM64 host, replaces pkg/template/build/core/systeminit/busybox_1.36.1-2 with an arm64 static binary. The target was added as a prerequisite to both build-local and build-debug.

The specific code path

build-local and build-debug both hard-code GOARCH=amd64 in their go build invocations. Critically, Makefile prerequisites run in their own shell invocation before the recipe body, so the inline GOARCH=amd64 in the build recipe does not propagate to fetch-busybox. The prerequisite shell sees the unmodified environment, where go env GOARCH returns the host arch — arm64 on an ARM64 machine.

Why existing safeguards don't prevent it

systeminit/busybox.go uses //go:embed busybox_1.36.1-2 to embed whatever binary is present at that path at compile time. There is no per-arch branching in the embed directive — it blindly embeds whatever file exists. The go build invocation does not check whether the embedded binary's ELF architecture matches GOARCH.

Impact

On an ARM64 developer machine, make build-local or make build-debug produces an amd64 orchestrator binary containing an arm64 busybox as systeminit.BusyboxBinary. At runtime, rootfs.go lines 210 and 213 write this binary to usr/bin/busybox and usr/bin/init inside every Firecracker VM rootfs. When the amd64 Firecracker VM attempts to exec /usr/bin/init, the kernel returns ENOEXEC (exec format error), causing every sandbox to fail at boot.

Step-by-step proof

  1. Developer is on an ARM64 machine (e.g., Apple M-series under Linux, or an ARM64 CI runner).
  2. Developer runs make build-local.
  3. Make resolves the fetch-busybox prerequisite first. Its shell runs go env GOARCHarm64.
  4. The condition [ "$$ARCH" \!= "arm64" ] is false, so the target proceeds to replace busybox_1.36.1-2 with an arm64 static binary (from the host busybox or via apt-get download busybox-static).
  5. build-local's recipe runs: CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build .... The Go compiler processes //go:embed busybox_1.36.1-2 and embeds the now-arm64 file into the amd64 binary.
  6. The resulting bin/orchestrator is an amd64 ELF that internally carries an arm64 ELF as BusyboxBinary.
  7. When this orchestrator runs and creates a Firecracker VM, it writes the arm64 busybox to the VM's rootfs as /usr/bin/init.
  8. The amd64 VM kernel tries to exec /usr/bin/init → exec format error → VM fails to boot → every sandbox creation fails.

Fix

Pass the build target arch explicitly to fetch-busybox, e.g. add TARGET_ARCH ?= amd64 and have fetch-busybox use TARGET_ARCH (or a dedicated Makefile variable) instead of go env GOARCH. Alternatively, change build-local/build-debug to call make fetch-busybox TARGET_ARCH=amd64 as a prerequisite. Since these targets always build for amd64, fetch-busybox should never replace the bundled amd64 busybox when building for them.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved by PR #2257 — build-local will use $(BUILD_ARCH) instead of hardcoded amd64 once the Makefiles PR merges.

@tomassrnka tomassrnka force-pushed the feat/create-build-arm64 branch from 8701dbd to 96ed56c Compare March 30, 2026 07:30
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

@jakubno jakubno assigned jakubno and unassigned ValentaTomas Mar 30, 2026
return fmt.Errorf("failed to download kernel: %w", err)
}

// Legacy URLs are x86_64-only; only fall back for amd64.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned in another PR, not really needed

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kept — confirmed needed for v1.10.

-kernel $(KERNEL_VERSION) \
-firecracker $(FIRECRACKER_VERSION)

.PHONY: fetch-busybox
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mentioned earlier, can we build it ourself and make it consistent for both arch?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced with committed multi-arch binaries using Go build tags. TODO in the arm64.go file to rebuild both from the same minimal config in the future.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we keep the version in the name? could you also add source to the comment? 🙏🏻

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — renamed to busybox_1.36.1-2_amd64 / busybox_1.36.1-2_arm64 with source comments. The amd64 binary is a custom musl build (origin unknown, added in #1002). The arm64 binary is from Debian busybox-static 1:1.36.1-9. Added TODO to rebuild both from the same config.

e2b and others added 3 commits March 30, 2026 19:18
- Update setupKernel/setupFC to use arch-prefixed URLs with legacy fallback
- Add errNotFound sentinel for 404 handling in download helper
- Use atomic temp-file writes to avoid partial downloads
- Add fetch-busybox Makefile target to swap busybox binary for ARM64 builds

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the single x86-64 busybox binary + fetch-busybox Makefile hack
with two architecture-specific binaries selected via Go build tags.
GOARCH automatically picks the right binary at compile time.

- busybox_amd64: existing x86-64 static binary (renamed)
- busybox_arm64: aarch64 static binary
- busybox_amd64.go / busybox_arm64.go: build-tag-gated embeds
- Remove fetch-busybox Makefile target (no longer needed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Per review: keep version in filename (busybox_1.36.1-2_{arch}) and
document the package source in the Go embed files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tomassrnka tomassrnka force-pushed the feat/create-build-arm64 branch from 775021b to 604cacb Compare March 31, 2026 07:41
@tomassrnka tomassrnka merged commit f88976c into main Mar 31, 2026
65 of 66 checks passed
@tomassrnka tomassrnka deleted the feat/create-build-arm64 branch March 31, 2026 10:33
drarijitdas added a commit to drarijitdas/infra that referenced this pull request Mar 31, 2026
Syncs all application code from e2b-dev/infra upstream including:
- Redis PubSub for state transitions (e2b-dev#2099)
- Pluggable egress firewall (e2b-dev#2187)
- Firecracker v1.12 upgrade (e2b-dev#2245)
- Label-based sandbox scheduler (e2b-dev#2066)
- Orchestrator internal/ -> pkg/ migration
- Pre-compute cgroup CPU deltas (e2b-dev#2265)
- Arch-aware downloads (e2b-dev#2258, e2b-dev#2260)
- Customizable pre-warmed NBDs (e2b-dev#2266)
- Autoresume improvements (e2b-dev#1969, e2b-dev#2196)
- Many bug fixes for race conditions, eviction, error handling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants